Vector-Neuron Models of Associative Memory 
Abstract: We consider two models of Hopfield-like associative memory with q-valued neurons: the Potts-glass neural network (PGNN) and the parametrical neural network (PNN). In these models neurons can be in more than two different states. The models have record storage capacity and noise immunity, and they significantly exceed the Hopfield model in both respects. We present a uniform formalism that allows us to describe both PNN and PGNN. The mechanisms inherent in these networks that are responsible for their outstanding recognition properties are clarified.

I. Introduction 

The number of patterns that can be stored in the Hopfield model (HM) is comparatively small. If N is the number of binary neurons, the thermodynamic approach leads to the well-known estimate of the HM storage capacity, $p_{HM} \approx 0.14N$ ([1], [2]). In the early 1990s several authors suggested Hopfield-like models of associative memory with q-valued neurons, which can be in more than two different states, q > 2 [3]-[10]. All these models are related to the Potts model of a magnet. The latter generalizes the Ising model to the case of a spin variable that takes q > 2 different values [11], [12]. In all these works the authors used the same well-known approach that links the Ising model with the Hopfield model (see, for example, [2]). Namely, instead of the short-range interaction between nearest-neighbor spins, Hebb-type interconnections between all q-valued neurons were used. As a result, long-range interactions appear. Then, in the mean-field approximation, it was possible to calculate the partition function (statistical sum) and, consequently, to construct the phase diagram. Different regions of the phase diagram were interpreted in terms of the network's ability to recognize noisy patterns.

For all these models, except one, the storage capacity is even smaller than that of HM. The exception is the so-called Potts-glass neural network (PGNN) [3]. The numerical solution of the system of transcendental equations resulting from the thermodynamic approach leads to the following estimate of the storage capacity of PGNN:

$$p_{PGNN} \approx \frac{q(q-1)}{2}\,p_{HM}.$$

Since q-valued models are intended for processing color images, the number q stands for the number of different colors in which an elementary pixel can be painted. Even for q ~ 10 the storage capacity of PGNN is about 50 times that of HM. For computer processing of color images the standard value is q = 256. Consequently, compared with HM the gain is about four orders of magnitude, $p_{PGNN} \sim 10^4\,p_{HM}$. This is a very good result. However, for a long time it was not clear why PGNN has such a large storage capacity. The thermodynamic approach does not answer this question.

On the other hand, we worked out a model of associative memory intended for implementation as an optical device ([13], [14]). Such a network is capable of storing and handling information encoded in the form of frequency-phase modulation. In the network the signals propagate along interconnections in the form of quasi-monochromatic pulses at q different frequencies. There are arguments in favor of this idea. First of all, frequency-phase modulation is more convenient for the optical processing of signals: it allows us to avoid artificially adapting an optical network to amplitude-modulated signals. Second, when signals with q different frequencies can propagate along one interconnection, this is an analog of channel multiplexing. In fact, this allows us to reduce the number of interconnections by a factor of q². Note that interconnections occupy nearly 98% of the area of neurochips.

At the center of our model lies the parametrical four-wave mixing process (FWM), which is well known in nonlinear optics [15]. However, for this model to have good characteristics, an important condition must be added that facilitates the propagation of the useful signal and, at the same time, suppresses the internal noise. This condition is the principle of incommensurability of frequencies, proposed in [13], [14] in nonlinear optics terms (see Sec. III).

The signal-to-noise analysis of our model, carried out with the aid of the Chebyshev-Chernov statistical method [16], [17], showed that the storage capacity of the network is approximately q² times the HM storage capacity. We called our network the parametrical neural network (PNN).

We worked out a vector formalism, a universal description of PNN that is not directly related to the optical model [18]-[20]. This formalism also proved useful for a clear description of PGNN, although PGNN was initially formulated in completely different terms. In this way one can easily establish the relations between PGNN and PNN and also clarify the mechanisms responsible for the outstanding recognition properties of both models. The reason is the local architecture of both networks, which suppresses the internal noise of the system. In other q-valued models there is no such suppression.

In this paper we describe PGNN using the vector formalism. Then we define our PNN, using both nonlinear optics terms and the vector formalism. Moreover, we consider some possible architectures for PNN.

Note. Our vector formalism is almost identical to the vector-neuron approach suggested some years ago in [21]. We found this paper after working out our own vector formalism. The dynamical rule in [21] was not formulated in the best way; however, it seems that the authors of [21] were the first to suggest the fruitful idea of representing the interconnection matrix as a tensor product of vector-neurons.

II. Potts-glass neural network

We describe PGNN in terms of our vector formalism and later compare it with PNN.

A. Vector formalism 

PGNN consists of N neurons, each of which can be in q different states. In order to describe the q different states of neurons we use a set of q-dimensional vectors of a special type, the so-called Potts vectors. Namely, the lth state of a neuron is described by a column vector $\mathbf{d}_l \in \mathbb{R}^q$,

$$\mathbf{d}_l = \Big(-\frac{1}{q}, \ldots, -\frac{1}{q},\; 1-\frac{1}{q},\; -\frac{1}{q}, \ldots, -\frac{1}{q}\Big)^T, \quad l = 1,\ldots,q,$$

where the component $1-1/q$ stands in the lth position.

The state of the ith neuron is described by a vector $\mathbf{x}_i = \mathbf{d}_{k_i}$, $1 \le k_i \le q$. The state of the network as a whole, X, is determined by a set of N column vectors $\mathbf{x}_i$: $X = (\mathbf{x}_1, \ldots, \mathbf{x}_N)$. The p stored patterns are

$$X^{(\mu)} = \big(\mathbf{x}_1^{(\mu)}, \ldots, \mathbf{x}_N^{(\mu)}\big), \quad \mathbf{x}_i^{(\mu)} = \mathbf{d}_{l_i^{(\mu)}}, \quad 1 \le l_i^{(\mu)} \le q, \quad \mu = 1,2,\ldots,p.$$
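As a quick numerical check of this definition, the following Python sketch (the function names are ours, not from the paper) builds the q Potts vectors as the rows of a matrix, where $\mathbf{d}_l$ has $1-1/q$ in position l and $-1/q$ elsewhere, and verifies the inner products $(\mathbf{d}_l,\mathbf{d}_{l'}) = \delta_{ll'} - 1/q$ on which the signal/noise analysis below relies. For q = 2 the two vectors are $\pm(1/2,-1/2)$, i.e. a binary spin rescaled by 1/2.

```python
import numpy as np

def potts_vectors(q: int) -> np.ndarray:
    """Return a (q, q) array whose l-th row is the Potts vector d_l.

    d_l has 1 - 1/q in position l and -1/q elsewhere, i.e. d_l = e_l - (1/q)(1, ..., 1).
    """
    return np.eye(q) - 1.0 / q

def gram(q: int) -> np.ndarray:
    """Matrix of all inner products (d_l, d_l')."""
    D = potts_vectors(q)
    return D @ D.T

if __name__ == "__main__":
    q = 4
    # 1 - 1/q on the diagonal, -1/q off the diagonal.
    print(np.allclose(gram(q), np.eye(q) - 1.0 / q))
    # For q = 2: d_1 = (1/2, -1/2) and d_2 = -d_1, a rescaled binary spin.
    print(potts_vectors(2))
```

Note that the components of every $\mathbf{d}_l$ sum to zero, so all Potts vectors lie in a (q − 1)-dimensional subspace.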
Since neurons are vectors, the local field $\mathbf{h}_i$ acting on the ith neuron is a vector too,

$$\mathbf{h}_i = \frac{1}{N}\sum_{j=1}^{N} T_{ij}\,\mathbf{x}_j.$$
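The (q×q)-matrices $T_{ij}$ describe the interconnections between the ith and jth neurons. By analogy with the Hopfield model these matrices are chosen in the generalized Hebb form:

$$T_{ij} = (1-\delta_{ij})\sum_{\mu=1}^{p}\mathbf{x}_i^{(\mu)}\mathbf{x}_j^{(\mu)+}, \quad i,j = 1,\ldots,N, \tag{1}$$

where $\mathbf{x}_j^{(\mu)+}$ is a q-dimensional row vector and $\delta_{ij}$ is the Kronecker symbol. The matrix $T_{ij}$ acts on the vector $\mathbf{x}_j \in \mathbb{R}^q$, converting it into a linear combination of the column vectors $\mathbf{d}_l$. After summation over all j we get the local field as a linear combination of the vectors $\mathbf{d}_l$:

$$\mathbf{h}_i(t) = \sum_{l=1}^{q} A_i^{(l)}\mathbf{d}_l, \qquad A_i^{(l)} = \frac{1}{N}\sum_{j\ne i}^{N}\sum_{\mu=1}^{p}\delta_{l\,l_i^{(\mu)}}\big(\mathbf{x}_j^{(\mu)}, \mathbf{x}_j(t)\big).$$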
Let k be the index of the maximal coefficient: $A_i^{(k)} > A_i^{(l)}$, $\forall l \ne k$. Then, by definition, at the next time step, t + 1, the ith neuron is oriented along the direction closest to the local field $\mathbf{h}_i$ at time t:

$$\mathbf{x}_i(t+1) = \mathbf{d}_k. \tag{2}$$



The evolution of the system consists of consecutive changes of the orientations of vector-neurons according to rule (2). We adopt the convention that if several of the coefficients $A_i^{(l)}$ are maximal simultaneously and the neuron is already in one of these unimprovable states, its state does not change. Then it is easy to show that during the evolution of the network its energy $H(t) = -\frac{1}{2}\sum_{i=1}^{N}\big(\mathbf{h}_i(t), \mathbf{x}_i(t)\big)$ decreases. In the end the system reaches a local energy minimum. In this state all the neurons $\mathbf{x}_i$ are oriented in an unimprovable manner, and the evolution of the system comes to an end. These states are the fixed points of the system. The necessary and sufficient condition for a configuration X to be a fixed point is the fulfillment of the set of inequalities

$$(\mathbf{x}_i, \mathbf{h}_i) \ge (\mathbf{d}_l, \mathbf{h}_i), \quad \forall l = 1,\ldots,q;\ \forall i = 1,\ldots,N. \tag{3}$$

When q = 2, PGNN is the same as the standard Hopfield model.
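The vector formalism translates almost line by line into array code. The sketch below is a minimal illustration, not the authors' implementation: it never forms the (q×q) matrices $T_{ij}$ explicitly but uses the identity $T_{ij}\mathbf{x}_j = \sum_\mu \mathbf{x}_i^{(\mu)}(\mathbf{x}_j^{(\mu)},\mathbf{x}_j)$ that follows from (1), accumulates the coefficients $A_i^{(l)}$, and applies rule (2) to all neurons at once. The synchronous sweep, the helper names, the toy sizes and the tie-breaking by `argmax` are our assumptions; the energy argument in the text refers to consecutive single-neuron changes.

```python
import numpy as np

def potts_vectors(q: int) -> np.ndarray:
    """Row l is the Potts vector d_l: 1 - 1/q in slot l, -1/q elsewhere."""
    return np.eye(q) - 1.0 / q

def pgnn_coefficients(D: np.ndarray, patterns: np.ndarray, state: np.ndarray) -> np.ndarray:
    """Coefficients A_i^(l) of the local field h_i = sum_l A_i^(l) d_l.

    patterns: (p, N) int array with entries l_i^(mu) (0-based)
    state:    (N,) int array with entries k_i, so that x_i = d_{k_i}
    """
    p, N = patterns.shape
    q = D.shape[0]
    P = D[patterns]                                 # (p, N, q): pattern vectors x_i^(mu)
    X = D[state]                                    # (N, q): current vectors x_j
    c = np.einsum("mjq,jq->mj", P, X)               # overlaps (x_j^(mu), x_j)
    s = (c.sum(axis=1, keepdims=True) - c) / N      # (1/N) * sum over j != i, as in Eq. (1)
    A = np.zeros((N, q))
    rows = np.broadcast_to(np.arange(N), (p, N))
    np.add.at(A, (rows, patterns), s)               # collect the terms with l = l_i^(mu)
    return A

def pgnn_sweep(D: np.ndarray, patterns: np.ndarray, state: np.ndarray) -> np.ndarray:
    """Rule (2) applied to all neurons at once: x_i <- d_k with k = argmax_l A_i^(l).

    Ties are broken by argmax (lowest index), not by the 'stay put' convention.
    """
    return np.argmax(pgnn_coefficients(D, patterns, state), axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    N, q, p, b = 100, 10, 30, 0.4
    D = potts_vectors(q)
    patterns = rng.integers(0, q, size=(p, N))
    # Noise operators b_j: with probability b, move neuron j to a different state.
    flip = rng.random(N) < b
    noisy = np.where(flip, (patterns[0] + rng.integers(1, q, size=N)) % q, patterns[0])
    print(np.array_equal(pgnn_sweep(D, patterns, noisy), patterns[0]))
```

Working with state indices instead of explicit vectors keeps memory at O(pN) rather than O(N²q²).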

B. Storage capacity of PGNN 

Let the patterns $\{X^{(\mu)}\}_1^p$ be random. Suppose that the network starts from a distorted mth pattern

$$\tilde X^{(m)} = \big(\hat b_1\mathbf{x}_1^{(m)}, \hat b_2\mathbf{x}_2^{(m)}, \ldots, \hat b_N\mathbf{x}_N^{(m)}\big).$$

The noise operator $\hat b_j$ changes the state of the vector $\mathbf{x}_j^{(m)}$ with probability b, and with probability $1-b$ this vector remains unchanged. In other words, b is the probability of an error in the state of a neuron. The noise operators $\hat b_j$ are independent, too.

The network recognizes the reference pattern $X^{(m)}$ correctly if the output of every ith neuron, defined by Eq.(2), is equal to $\mathbf{x}_i^{(m)}$. Otherwise, PGNN fails to recognize the pattern $X^{(m)}$. Let us estimate the probability of error in the recognition of the mth pattern.

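Simple calculations show that the probability that the inequality $(\mathbf{x}_i^{(m)}, \mathbf{h}_i) < (\mathbf{d}_l, \mathbf{h}_i)$ holds for $\mathbf{d}_l \ne \mathbf{x}_i^{(m)}$ can be expressed as

$$\Pr\{\xi < \eta\}, \qquad \xi = \frac{1}{N}\sum_{j\ne i}^{N}\xi_j, \qquad \eta = \frac{1}{N}\sum_{j\ne i}^{N}\sum_{\mu\ne m}^{p}\eta_j^{(\mu)}, \tag{4}$$

where $\xi_j = \big(\mathbf{x}_j^{(m)}, \hat b_j\mathbf{x}_j^{(m)}\big)$ and $\eta_j^{(\mu)} = \big(\mathbf{d}_l - \mathbf{x}_i^{(m)}, \mathbf{x}_i^{(\mu)}\big)\big(\mathbf{x}_j^{(\mu)}, \hat b_j\mathbf{x}_j^{(m)}\big)$.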

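The quantity ξ is the useful signal. It is connected with the influence of exactly the mth pattern on the ith neuron. The partial random variables $\xi_j$ are independent and identically distributed. The quantity η represents the internal noise, connected with the distorting influence of all the other patterns. The partial noise components $\eta_j^{(\mu)}$ are independent and identically distributed, too. It is easy to obtain the distributions of $\xi_j$ and $\eta_j^{(\mu)}$:

$$\xi_j = \begin{cases} 1-\frac{1}{q}, & \text{with probability } 1-b,\\ -\frac{1}{q}, & \text{with probability } b, \end{cases} \qquad \eta_j^{(\mu)} = \begin{cases} \pm\big(1-\frac{1}{q}\big), & \text{each with probability } \frac{1}{q^2},\\ \pm\frac{1}{q}, & \text{each with probability } \frac{q-1}{q^2},\\ 0, & \text{with probability } \frac{q-2}{q}. \end{cases} \tag{5}$$

Note that at q ≫ 1 the noise component $\eta_j^{(\mu)}$ is concentrated mainly at zero:

$$\Pr\{\eta_j^{(\mu)} = 0\} = \frac{q-2}{q} \approx 1.$$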
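The distribution of $\eta_j^{(\mu)}$ in (5) can be checked by exact enumeration. The sketch below assumes, without loss of generality, that $\mathbf{x}_i^{(m)} = \mathbf{d}_1$, the competing state is $\mathbf{d}_l = \mathbf{d}_2$ and the distorted input is $\hat b_j\mathbf{x}_j^{(m)} = \mathbf{d}_1$ (0-based indices 0 and 1 in the code), and averages over independent, uniformly distributed $\mathbf{x}_i^{(\mu)}$ and $\mathbf{x}_j^{(\mu)}$ using exact rational arithmetic. The helper names are ours.

```python
from fractions import Fraction
from itertools import product

def potts(q, l):
    """Potts vector d_l with exact entries: 1 - 1/q at position l, -1/q elsewhere."""
    return tuple(Fraction(int(k == l)) - Fraction(1, q) for k in range(q))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def eta_law(q):
    """Exact law {value: probability} of eta_j^(mu) = (d_l - x_i^(m), x_i^(mu)) (x_j^(mu), b_j x_j^(m))."""
    d = [potts(q, l) for l in range(q)]
    diff = tuple(a - b for a, b in zip(d[1], d[0]))   # d_l - x_i^(m), with l = 1 and x_i^(m) = d_0
    law = {}
    for li, lj in product(range(q), repeat=2):        # x_i^(mu) = d_li, x_j^(mu) = d_lj
        value = dot(diff, d[li]) * dot(d[lj], d[0])   # distorted input b_j x_j^(m) = d_0
        law[value] = law.get(value, Fraction(0)) + Fraction(1, q * q)
    return law

def moments(law):
    """Mean and variance of a finite distribution given as {value: probability}."""
    mean = sum(v * p for v, p in law.items())
    return mean, sum(v * v * p for v, p in law.items()) - mean ** 2

if __name__ == "__main__":
    q = 10
    law = eta_law(q)
    print(law[Fraction(0)])   # equals (q - 2)/q
    print(moments(law))       # mean 0, variance 2(q - 1)/q**3
```

Since $E(\eta_j^{(\mu)}) = 0$ and there are about Np independent terms divided by N, the per-term variance $2(q-1)/q^3$ gives the dispersion of η in (6) below.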

The total random variables ξ and η are asymptotically normally distributed with parameters

$$E(\xi) = \frac{q-1}{q} - b, \quad E(\eta) = 0, \quad D(\xi) \to 0, \quad D(\eta) = \frac{2(q-1)}{q^3}\,\alpha, \tag{6}$$

where, as usual, the loading parameter is $\alpha = p/N$. Now the probability of a recognition error for the coordinate $\mathbf{x}_i^{(m)}$ can be calculated by integrating the area under the "tail" of the normally distributed η, where $\eta > E(\xi)$. Here we can explain why the storage capacity of PGNN is much larger than that of HM.

The same considerations are valid for HM; this is done, for example, in [2]. Again we obtain a useful signal ξ and an internal noise η, and Eq.(4) for the probability of recognition failure. Again these random quantities are asymptotically normal as sums of independent, identically distributed partial random components $\xi_j$ and $\eta_j^{(\mu)}$. The distributions of these components can be obtained from Eq.(5) at q = 2 (because PGNN transforms into HM in this case). The mean values and dispersions of ξ and η can be obtained from (6) in the same way. As a result, for HM we have

$$\xi_j = \begin{cases} \frac{1}{2}, & \text{with probability } 1-b,\\ -\frac{1}{2}, & \text{with probability } b, \end{cases} \qquad \eta_j^{(\mu)} = \begin{cases} \frac{1}{2}, & \text{with probability } \frac{1}{2},\\ -\frac{1}{2}, & \text{with probability } \frac{1}{2}, \end{cases}$$

$$E(\xi) = \frac{1}{2} - b, \quad E(\eta) = 0, \quad D(\xi) \to 0, \quad D(\eta) = \frac{\alpha}{4}. \tag{7}$$



Comparison of (7) with (5) and (6) demonstrates that the dispersion of the internal noise for PGNN is much smaller than that for HM:

$$\frac{D_{PGNN}(\eta)}{D_{HM}(\eta)} = \frac{8(q-1)}{q^3} \ll 1, \quad \text{when } q \gg 1.$$

Already at q ~ 10 the internal noise dispersion for PGNN is an order of magnitude smaller than that for HM. Moreover, at q ~ 10² the dispersion falls by three orders of magnitude, and at q = 256 by about four! This is what determines the superiority of PGNN over HM. We explain the mechanism of internal noise suppression in PGNN in Sec. III.
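The ratio of the noise dispersions, $8(q-1)/q^3$ from (6) and (7) at equal loading α, is easy to tabulate. This short Python sketch (the function name is ours) prints how many orders of magnitude PGNN gains over HM for a few values of q, and shows that the ratio equals 1 at q = 2, where PGNN coincides with HM.

```python
import math

def dispersion_ratio(q: int) -> float:
    """D_PGNN(eta) / D_HM(eta) = [2(q-1)/q^3] / [1/4] = 8(q-1)/q^3 at the same loading alpha."""
    return 8 * (q - 1) / q ** 3

if __name__ == "__main__":
    for q in (2, 10, 100, 256):
        r = dispersion_ratio(q)
        print(f"q = {q:3d}: ratio = {r:.2e}, gain = {math.log10(1 / r):.1f} orders of magnitude")
```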

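Switching from a single vector coordinate to the whole pattern and using the standard approximation ([19], [20]), we obtain the following expression for the probability of error in the recognition of the pattern $X^{(m)}$ (the factor N(q − 1) counts the N neurons and the q − 1 competing states of each):

$$\Pr_{err} \approx N(q-1)\sqrt{\frac{D(\eta)}{2\pi E^2(\xi)}}\,\exp\left(-\frac{E^2(\xi)}{2D(\eta)}\right) = N\sqrt{\frac{(q-1)\,p}{\pi q N}}\;\frac{1}{1-\tilde b}\,\exp\left(-\frac{N}{4p}\,q(q-1)(1-\tilde b)^2\right), \qquad \tilde b = \frac{q}{q-1}\,b. \tag{8}$$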

This expression sets an upper limit for the probability of recognition failure of PGNN. Consequently, the asymptotically attainable storage capacity of PGNN, with $\tilde b$ as in (8), is

$$p_c = \frac{N}{4\ln N}\,q(q-1)\,(1-\tilde b)^2. \tag{9}$$

When q = 2, these expressions give the known estimates for HM. For q > 2 the storage capacity of PGNN is q(q − 1)/2 times that of HM. In [3] the same factor was obtained by fitting the results of numerical calculations. We obtain the same result rigorously.
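Estimate (9) can be compared with the Hopfield result numerically. The sketch assumes (9) in the form $p_c = Nq(q-1)(1-\tilde b)^2/(4\ln N)$ with $\tilde b = qb/(q-1)$, and the HM estimate $N(1-2b)^2/(2\ln N)$ that this formula gives at q = 2; it only illustrates the q(q − 1)/2 factor, and the function names are ours.

```python
import math

def pgnn_capacity(N: int, q: int, b: float = 0.0) -> float:
    """Asymptotic PGNN storage capacity, Eq. (9): N q (q-1) (1 - b~)^2 / (4 ln N)."""
    b_tilde = q * b / (q - 1)
    return N * q * (q - 1) * (1 - b_tilde) ** 2 / (4 * math.log(N))

def hm_capacity(N: int, b: float = 0.0) -> float:
    """Hopfield estimate N (1 - 2b)^2 / (2 ln N), i.e. Eq. (9) evaluated at q = 2."""
    return N * (1 - 2 * b) ** 2 / (2 * math.log(N))

if __name__ == "__main__":
    N = 10_000
    for q in (2, 10, 256):
        gain = pgnn_capacity(N, q) / hm_capacity(N)
        print(q, gain, q * (q - 1) / 2)   # the two numbers agree up to rounding
```

Note that in (9) the noise enters through $\tilde b$, not b: for q = 2, $\tilde b = 2b$, which turns $(1-\tilde b)^2$ into the familiar Hopfield factor $(1-2b)^2$.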

III. Parametrical neural network 

Here we describe our associative memory model in terms of both nonlinear optics and the vector formalism. We also set out the results obtained for this model.

A. Nonlinear optics formulation

In the network the signals propagate along interconnections in the form of quasi-monochromatic pulses at q different frequencies

$$\{\omega_l\}_1^q = \{\omega_1, \omega_2, \ldots, \omega_q\}. \tag{10}$$



The model is based on a parametrical neuron, which is a cubic nonlinear element capable of transforming and generating frequencies in the parametrical FWM-processes $\omega_i - \omega_j + \omega_k \to \omega_r$. Schematically, this model of a neuron can be regarded as a device composed of a summator of input signals, a set of q ideal frequency filters $\{\omega_l\}_1^q$, a block comparing the amplitudes of the signals, and q generators of quasi-monochromatic signals $\{\omega_l\}_1^q$.

Let $\{K^{(\mu)}\}_1^p$ be a set of patterns, each of which is a set of quasi-monochromatic pulses with frequencies defined by Eq.(10) and amplitudes equal to ±1:

$$K_i^{(\mu)} = \pm\exp\big(\mathrm{i}\,\omega_{l_i^{(\mu)}}\,t\big), \quad \mu = 1,\ldots,p;\ i = 1,\ldots,N;\ 1 \le l_i^{(\mu)} \le q. \tag{11}$$

The memory of the network is localized in the interconnections $T_{ij}$, i, j = 1,...,N, which accumulate the information about the states of the ith and jth neurons in all the p patterns. We suppose that the interconnections are dynamic and that they are organized according to the Hebb rule:

$$T_{ij} = (1-\delta_{ij})\sum_{\mu=1}^{p} K_i^{(\mu)}\big(K_j^{(\mu)}\big)^{*}, \quad i,j = 1,\ldots,N. \tag{12}$$



The network operates as follows. A quasi-monochromatic pulse with a frequency $\omega_{l_j}$, propagating along the (ij)th interconnection from the jth neuron to the ith one, takes part in FWM-processes with the pulses stored in the interconnection, $\omega_{l_i^{(\mu)}} - \omega_{l_j^{(\mu)}} + \omega_{l_j} \to \omega_r$. The amplitudes ±1 are multiplied. Summing up the results of these partial transformations over all patterns, μ = 1,...,p, we obtain a packet of quasi-monochromatic pulses in which all the frequencies from the set (10) are present. This packet is the result of the transformation of the pulse by the interconnection $T_{ij}$, and it arrives at the ith neuron. All such packets are summed in this neuron. The summed signal passes through q parallel ideal frequency filters. The output signals of the filters are compared with respect to their amplitudes. The signal with the maximal amplitude activates the ith neuron ('winner-take-all'). As a result, the neuron generates an output signal whose frequency and phase are the same as those of the activating signal.

Generally, when three pulses interact in an FWM-process, a fourth pulse always appears. Its frequency is defined by the conservation laws only. However, for the above-mentioned model to work as a memory, an important condition must be added, which facilitates the propagation of the useful signal and, at the same time, suppresses the internal noise. This condition is the principle of incommensurability of frequencies proposed in [13], [14]: no combination $\omega_l - \omega_{l'} + \omega_{l''}$ can belong to the set (10) when all the frequencies are different.
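The incommensurability principle is a purely combinatorial condition on the frequency set (10), so it can be checked mechanically. The sketch below reads "all the frequencies are different" as l, l', l'' pairwise distinct (our interpretation), and uses integer frequencies in arbitrary units so that membership can be tested exactly; the example sets and the function name are illustrative only.

```python
from itertools import permutations

def is_incommensurable(freqs) -> bool:
    """True if no combination w_l - w_l' + w_l'' with pairwise distinct l, l', l''
    falls back into the frequency set (frequencies assumed distinct)."""
    allowed = set(freqs)
    for w_l, w_l1, w_l2 in permutations(freqs, 3):   # ordered triples of distinct elements
        if w_l - w_l1 + w_l2 in allowed:
            return False
    return True

if __name__ == "__main__":
    print(is_incommensurable([1, 2, 3]))      # False: 1 - 2 + 3 = 2 is in the set
    print(is_incommensurable([1, 10, 100]))   # True: no combination returns to the set
```

An equally spaced set always fails, because $\omega_1 - \omega_2 + \omega_3 = \omega_2$; geometrically spaced sets such as powers of 4 pass, since their pairwise sums are all distinct.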

This completes the description of the principle of the network operation. This network will be called the parametrical neural network (PNN). Here an important remark has to be made.

Generally speaking, there are different parametrical FWM-processes complying with the principle of incommensurability of frequencies. However, better results can be obtained for the parametrical FWM-process

$$\omega_l - \omega_{l'} + \omega_{l''} \to \begin{cases} \omega_l, & \text{when } l' = l'',\\ 0, & \text{in other cases.} \end{cases} \tag{13}$$

This architecture will be called PNN-2 (another architecture, PNN-1, was examined in [13], [14]). Here we investigate the abilities of PNN-2. The rest of the paper is structured as follows. In the next subsection we introduce a vector formalism that allows us to formulate the problem in general form. Then the results for PNN-2 are presented. After that we briefly mention other neuro-architectures based on PNN-2. Some remarks are given in the Conclusions.

B. Vector formalism for PNN-2 

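In order to describe the q different states (10) of neurons we use the set of basis vectors $\mathbf{e}_l$ in the space $\mathbb{R}^q$, q ≥ 1,

$$\mathbf{e}_l = (0, \ldots, 0, 1, 0, \ldots, 0)^T, \quad l = 1,\ldots,q,$$

where the unit stands in the lth position.

The state of the ith neuron is described by a vector $\mathbf{x}_i$,

$$\mathbf{x}_i = x_i\,\mathbf{e}_{k_i}, \quad x_i = \pm 1, \quad \mathbf{e}_{k_i} \in \mathbb{R}^q, \quad 1 \le k_i \le q, \quad i = 1,\ldots,N. \tag{14}$$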



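The factor $x_i$ denotes the signal phase. The state of the network as a whole, X, is determined by a set of N q-dimensional vectors $\mathbf{x}_i$: $X = (\mathbf{x}_1, \ldots, \mathbf{x}_N)$. The p stored patterns are

$$X^{(\mu)} = \big(\mathbf{x}_1^{(\mu)}, \ldots, \mathbf{x}_N^{(\mu)}\big), \quad \mathbf{x}_i^{(\mu)} = x_i^{(\mu)}\mathbf{e}_{l_i^{(\mu)}}, \quad x_i^{(\mu)} = \pm 1, \quad 1 \le l_i^{(\mu)} \le q, \quad \mu = 1,\ldots,p,$$

and the local field is

$$\mathbf{h}_i = \frac{1}{N}\sum_{j=1}^{N} T_{ij}\,\mathbf{x}_j. \tag{15}$$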



The (q×q)-matrix $T_{ij}$ describes the interconnection between the ith and jth neurons. This matrix acts on the vector $\mathbf{x}_j \in \mathbb{R}^q$, converting it into a linear combination of the basis vectors $\mathbf{e}_l$. This combination is an analog of the packet of quasi-monochromatic pulses that come from the jth neuron to the ith one after transformation in the interconnection. To satisfy conditions (12) and (13), we need to take the matrices $T_{ij}$ as

$$T_{ij} = (1-\delta_{ij})\sum_{\mu=1}^{p}\mathbf{x}_i^{(\mu)}\mathbf{x}_j^{(\mu)+}, \quad i,j = 1,\ldots,N. \tag{16}$$

Note that the structure of this expression is similar to that of (1).

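The dynamic rule remains as before: at time t + 1 the ith neuron is oriented along the direction closest to the local field $\mathbf{h}_i(t)$. However, the expressions differ from (2). Indeed, with the aid of (16) we write the local field in a form more convenient for analysis:

$$\mathbf{h}_i(t) = \sum_{l=1}^{q} A_i^{(l)}\mathbf{e}_l, \qquad A_i^{(l)} = \frac{1}{N}\sum_{j\ne i}^{N}\sum_{\mu=1}^{p}\big(\mathbf{e}_l, \mathbf{x}_i^{(\mu)}\big)\big(\mathbf{x}_j^{(\mu)}, \mathbf{x}_j(t)\big). \tag{17}$$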

Let k be the index of the amplitude that is maximal in modulus in the series (17): $|A_i^{(k)}| > |A_i^{(l)}|$, $\forall l \ne k$. Then, according to our definition,

$$\mathbf{x}_i(t+1) = \mathrm{sgn}\big(A_i^{(k)}\big)\,\mathbf{e}_k. \tag{18}$$

The evolution of the system consists of consecutive changes of the orientations of vector-neurons according to rule (18). The necessary and sufficient condition for a configuration X to be a fixed point is the fulfillment of the set of inequalities

$$(\mathbf{x}_i, \mathbf{h}_i) \ge \big|(\mathbf{e}_l, \mathbf{h}_i)\big|, \quad \forall l = 1,\ldots,q;\ \forall i = 1,\ldots,N$$

(compare with Eq.(3)).
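Rule (18) can be tried out directly. The following is a minimal NumPy sketch of PNN-2, not the authors' code: patterns are random, the distortion follows the phase/frequency noise model with probabilities a and b described in Sec. III.C, the matrices (16) are applied implicitly through $T_{ij}\mathbf{x}_j = \sum_\mu \mathbf{x}_i^{(\mu)}(\mathbf{x}_j^{(\mu)},\mathbf{x}_j)$, and all neurons are updated in one synchronous sweep, a simplification we chose for brevity. Helper names, sizes and noise levels are arbitrary toy choices.

```python
import numpy as np

def random_patterns(rng, p, N, q):
    """(p, N, q) array with x_i^(mu) = (+1 or -1) * e_{l_i^(mu)}."""
    P = np.zeros((p, N, q))
    idx = rng.integers(0, q, size=(p, N))
    phase = rng.choice([-1.0, 1.0], size=(p, N))
    P[np.arange(p)[:, None], np.arange(N), idx] = phase
    return P

def distort(rng, x, a, b):
    """Frequency noise (prob. b: move to another basis vector), then phase noise (prob. a: flip sign)."""
    N, q = x.shape
    k = np.argmax(np.abs(x), axis=1)
    phase = x[np.arange(N), k]
    k = np.where(rng.random(N) < b, (k + rng.integers(1, q, size=N)) % q, k)
    phase = np.where(rng.random(N) < a, -phase, phase)
    y = np.zeros_like(x)
    y[np.arange(N), k] = phase
    return y

def local_field(P, X):
    """h_i = (1/N) sum_{j != i} T_ij x_j with T_ij from Eq. (16), never formed explicitly."""
    N = X.shape[0]
    c = np.einsum("mjq,jq->mj", P, X)           # overlaps (x_j^(mu), x_j)
    s = c.sum(axis=1, keepdims=True) - c        # drop the j = i term
    return np.einsum("mi,miq->iq", s, P) / N    # row i holds the amplitudes A_i^(l)

def pnn_sweep(P, X):
    """Rule (18): x_i <- sgn(A_i^(k)) e_k, where |A_i^(k)| is maximal."""
    h = local_field(P, X)
    rows = np.arange(X.shape[0])
    k = np.argmax(np.abs(h), axis=1)
    Y = np.zeros_like(X)
    Y[rows, k] = np.sign(h[rows, k])
    return Y

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, q, p = 100, 16, 50
    P = random_patterns(rng, p, N, q)
    noisy = distort(rng, P[0], a=0.1, b=0.4)
    print(np.array_equal(pnn_sweep(P, noisy), P[0]))
```

Because the basis vectors $\mathbf{e}_l$ are orthonormal, the components of the computed local field are exactly the amplitudes $A_i^{(l)}$ of (17).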

C. Storage capacity of PNN-2 

All these considerations are identical to those for PGNN. Differences appear only because the neurons are now defined not only by vectors, but also by the scalars ±1. The distorted mth pattern has the form

$$\tilde X^{(m)} = \big(a_1\hat b_1\mathbf{x}_1^{(m)}, a_2\hat b_2\mathbf{x}_2^{(m)}, \ldots, a_N\hat b_N\mathbf{x}_N^{(m)}\big).$$

Here $\{a_i\}_1^N$ and $\{\hat b_i\}_1^N$ define the phase noise and the frequency noise, respectively: $a_i$ is a random value equal to −1 or +1 with probabilities a and $1-a$, respectively; b is the probability that the operator $\hat b_i$ changes the state of the vector $\mathbf{x}_i^{(m)} = x_i^{(m)}\mathbf{e}_{l_i^{(m)}}$, and $1-b$ is the probability that this vector remains unchanged.

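The amplitudes (17) have the form

$$x_i^{(m)}A_i^{(l_i^{(m)})} = \frac{1}{N}\sum_{j\ne i}^{N}\xi_j + \frac{1}{N}\sum_{j\ne i}^{N}\sum_{\mu\ne m}^{p}\eta_j^{(\mu)} \equiv \xi + \eta,$$

where $\xi_j = \big(\mathbf{x}_j^{(m)}, \tilde{\mathbf{x}}_j^{(m)}\big)$, $\eta_j^{(\mu)} = x_i^{(m)}\big(\mathbf{e}_{l_i^{(m)}}, \mathbf{x}_i^{(\mu)}\big)\big(\mathbf{x}_j^{(\mu)}, \tilde{\mathbf{x}}_j^{(m)}\big)$, $j\,(\ne i) = 1,\ldots,N$, $\mu\,(\ne m) = 1,\ldots,p$, and $\tilde{\mathbf{x}}_j^{(m)} = a_j\hat b_j\mathbf{x}_j^{(m)}$. When the patterns $\{X^{(\mu)}\}_1^p$ are uncorrelated, the quantities $\xi_j$ and $\eta_j^{(\mu)}$ are independent random variables described by the probability distributions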



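$$\xi_j = \begin{cases} +1, & \text{with probability } (1-b)(1-a),\\ 0, & \text{with probability } b,\\ -1, & \text{with probability } (1-b)a, \end{cases} \qquad \eta_j^{(\mu)} = \begin{cases} +1, & \text{with probability } \frac{1}{2q^2},\\ 0, & \text{with probability } 1-\frac{1}{q^2},\\ -1, & \text{with probability } \frac{1}{2q^2} \end{cases}$$

(compare with Eq.(5)). As in the case of PGNN, when q ≫ 1 the noise component $\eta_j^{(\mu)}$ is concentrated mainly at zero:

$$\Pr\{\eta_j^{(\mu)} = 0\} = 1 - \frac{1}{q^2} \approx 1.$$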
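Eq.(6) now transforms into

$$E(\xi) = (1-2a)(1-b), \quad E(\eta) = 0, \quad D(\xi) \to 0, \quad D(\eta) = \frac{\alpha}{q^2}.$$

When q ≫ 1 the dispersion of the internal noise for PNN-2 is even smaller than that for PGNN:

$$\frac{D_{PNN}(\eta)}{D_{PGNN}(\eta)} = \frac{q}{2(q-1)} \approx \frac{1}{2}, \quad \text{when } q \gg 1.$$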
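The same exact-enumeration check applies to PNN-2. The sketch below (helper names are ours, exact rational arithmetic) assumes without loss of generality that $\mathbf{x}_i^{(m)} = \tilde{\mathbf{x}}_j^{(m)} = +\mathbf{e}_1$ (index 0 in the code), averages over independent, uniformly distributed frequency indices and phases of $\mathbf{x}_i^{(\mu)}$ and $\mathbf{x}_j^{(\mu)}$, and reproduces the law of $\eta_j^{(\mu)}$, its variance $1/q^2$ (hence a total noise dispersion $\alpha/q^2$), and the mean $(1-2a)(1-b)$ of the signal component $\xi_j$.

```python
from fractions import Fraction
from itertools import product

def pnn_eta_law(q):
    """Exact law of eta_j^(mu) = x_i^(m) (e_c, x_i^(mu)) (x_j^(mu), x~_j) for PNN-2.

    WLOG x_i^(m) = x~_j = +e_0 (so c = 0); x_i^(mu) = s_i e_li and x_j^(mu) = s_j e_lj,
    with li, lj, s_i, s_j independent and uniformly distributed.
    """
    law = {}
    w = Fraction(1, 4 * q * q)
    for li, lj, s_i, s_j in product(range(q), range(q), (1, -1), (1, -1)):
        value = (s_i if li == 0 else 0) * (s_j if lj == 0 else 0)
        law[value] = law.get(value, Fraction(0)) + w
    return law

def xi_mean(a, b):
    """Mean of xi_j: +1 w.p. (1-b)(1-a), 0 w.p. b, -1 w.p. (1-b)a."""
    return (1 - b) * (1 - a) - (1 - b) * a

if __name__ == "__main__":
    q = 8
    law = pnn_eta_law(q)
    print(law)                                       # +1 and -1 w.p. 1/(2q^2) each, 0 otherwise
    print(sum(v * v * p for v, p in law.items()))    # variance 1/q^2
    print(xi_mean(Fraction(1, 10), Fraction(2, 5)))  # (1 - 2a)(1 - b)
```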

In the long run this determines the superiority of PNN-2 over PGNN in memory capacity and noise immunity. It is convenient to describe here the mechanisms that suppress internal noise. They are identical in both models, but we demonstrate them using PNN as an example.

When a signal propagates, it interacts with the frequencies stored in the interconnection, $\omega_{l_i^{(\mu)}} - \omega_{l_j^{(\mu)}} + \omega_{l_j} \to \omega_r$. In addition, the principle of incommensurability of frequencies (13) should be fulfilled. In vector terms it can be formulated as

$$\big(\mathbf{e}_{l_j^{(\mu)}}, \mathbf{e}_{l_j}\big) = \begin{cases} 1, & \text{when } l_j^{(\mu)} = l_j,\\ 0, & \text{in other cases.} \end{cases}$$

One can see from the last equation that the largest part of the propagating signals is suppressed. This happens because the interconnection selects, from all possible combinations of the indices $l_j^{(\mu)}$ and $l_j$, only those in which the indices coincide (the other combinations give zero). In other words, the interconnection filters the signals. This is the main reason why the largest part of the internal noise η is concentrated at zero.

A similar filtration also happens in PGNN. The difference is that in PGNN the signal always propagates through the interconnection. But when the indices $l_j^{(\mu)}$ and $l_j$ coincide, the signal is attributed a large positive amplitude ~ 1, whereas if they do not coincide, it is attributed a small negative amplitude ~ −1/q. This signal filtration leads to the suppression of internal noise in PGNN. In all other q-valued models of associative memory this filtration is absent.

To conclude our consideration of PNN-2, we give the expressions for the noise immunity and the storage capacity, similar to (8) and (9):

$$\Pr_{err} \approx N\,\frac{2q-1}{q}\sqrt{\frac{p}{2\pi N}}\;\frac{1}{(1-2a)(1-b)}\,\exp\left(-\frac{N(1-2a)^2\,q^2(1-b)^2}{2p}\right), \tag{19}$$

$$p_c = \frac{N(1-2a)^2}{2\ln N}\,q^2(1-b)^2. \tag{20}$$

When q = 1, Eqs.(19)-(20) transform into the well-known results for the standard Hopfield model (in this case there is no frequency noise, b = 0). As q increases, the probability of error (19) decreases exponentially, i.e. the noise immunity of PNN increases noticeably. At the same time the storage capacity of the network increases proportionally to q². In contrast to the Hopfield model, the number of patterns p can be much greater than the number of neurons.

For example, let us fix $\Pr_{err} = 0.01$. In the Hopfield model, with this probability of error we can recognize any of p = N/10 patterns, each of which is less than 30% noisy. At the same time, PNN-2 with q = 64 allows us to recognize any of p = 5N patterns with 90% noise, or any of p = 50N patterns with 65% noise. Our computer simulations confirm these results.

The memory capacity of PNN-2 is twice as large as that of PGNN. Evidently, this is connected with the fact that for the same q the number of different states of a neuron in PNN-2 (2q) is twice as large as that in PGNN (q). In general, both models have very similar characteristics.
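The factor of two can be read off by putting (20) and (9) side by side. The sketch assumes (20) as $p_c = N(1-2a)^2q^2(1-b)^2/(2\ln N)$ and (9) as $p_c = Nq(q-1)(1-\tilde b)^2/(4\ln N)$ with $\tilde b = qb/(q-1)$; the function names are ours. Without noise the ratio is exactly 2q/(q − 1), which tends to 2, and (20) with q = 1 and b = 0 reduces to the Hopfield estimate $N(1-2a)^2/(2\ln N)$.

```python
import math

def pnn_capacity(N, q, a=0.0, b=0.0):
    """Eq. (20): N (1-2a)^2 q^2 (1-b)^2 / (2 ln N)."""
    return N * (1 - 2 * a) ** 2 * q ** 2 * (1 - b) ** 2 / (2 * math.log(N))

def pgnn_capacity(N, q, b=0.0):
    """Eq. (9): N q (q-1) (1 - q b/(q-1))^2 / (4 ln N)."""
    return N * q * (q - 1) * (1 - q * b / (q - 1)) ** 2 / (4 * math.log(N))

if __name__ == "__main__":
    N = 10_000
    for q in (4, 16, 64, 256):
        # Noise-free ratio 2q/(q-1), approaching 2 as q grows.
        print(q, pnn_capacity(N, q) / pgnn_capacity(N, q))
```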

D. Other PNN-architectures 

1) Phase-independent PNN-3: When PNN is realized as a device, the problem arises that one has to control the phases of all signals: all phases should be matched. This is a rather difficult problem. It seems that the easiest way to overcome this difficulty is to make all phases identical. Formally, we should set all the amplitudes ±1 in (11) and (14) equal to 1. A more precise analysis shows that in this case the partial noise components are no longer independent, and the noise dispersion increases drastically. The way out is to use specially chosen vector thresholds in the definition of the local field [22] (Eq. (21)), where the matrices $T_{ij}$ are determined by Eq.(16). Then the partial noise components $\eta_j^{(\mu)}$ become uncorrelated, and it is possible to apply the probability-theoretic approach to estimate the signal/noise ratio.

The means and dispersions of the total random variables ξ and η are the same as in expressions (6). In fact, the phase-independent PNN as a whole (we call it PNN-3) is equivalent to PGNN. Compared with the PNN-3 model, PGNN is too complicated, because it uses the Potts vectors $\mathbf{d}_l$ instead of the basis vectors $\mathbf{e}_l$. Implemented as a computer algorithm, PNN-3 works q times faster than PGNN.

2) Decorrelating PNN: We suggested a method for substantially enlarging binary associative memory with the help of the PNN architecture in the case of correlated patterns ([23], [24]). As is known, the memory capacity of the Hopfield model falls drastically if there are correlations between patterns, so the only way out has been the so-called sparse coding [25]-[29]. Our method is an alternative to this approach.

At the heart of our approach is a one-to-one mapping of binary patterns into an internal representation that uses vector-neurons of large dimension, q ≫ 1. Then a PNN is constructed on the basis of the obtained vector-neuron patterns. The representation has the following properties: i) correlations between vector-neuron patterns become negligible; ii) the dimension q of the vector-neurons increases exponentially as a function of the mapping parameter. The larger the dimension q, the better the recognition properties of PNN. The exponential increase of q therefore leads to an exponential increase of the binary memory capacity.
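The mapping itself is not spelled out here, so the following is only an illustration of a one-to-one, redundancy-free encoding of the kind described, and the specific scheme is our assumption, not necessarily the one of [23], [24]. A binary pattern is cut into blocks of r bits; the first bit of each block sets the phase ±1 and the remaining r − 1 bits select one of q = 2^(r−1) basis vectors. Thus q grows exponentially with the mapping parameter r while the number of neurons drops to n/r. The sketch does not demonstrate the decorrelation property, which the text attributes to the mapping.

```python
def encode(bits, r):
    """Hypothetical block mapping: r bits -> one vector-neuron (phase, index k).

    Bit 0 of a block gives the phase (+1 for 0, -1 for 1); bits 1..r-1, read as a
    binary number, give the index k of the basis vector e_k, 0 <= k < 2**(r-1).
    """
    if r < 2 or len(bits) % r:
        raise ValueError("need r >= 2 and len(bits) divisible by r")
    neurons = []
    for start in range(0, len(bits), r):
        block = bits[start:start + r]
        phase = -1 if block[0] else 1
        k = int("".join(str(b) for b in block[1:]), 2)
        neurons.append((phase, k))
    return neurons

def decode(neurons, r):
    """Inverse of encode: recovers the original bit list exactly (no redundancy)."""
    bits = []
    for phase, k in neurons:
        bits.append(0 if phase == 1 else 1)
        bits.extend(int(c) for c in format(k, f"0{r - 1}b"))
    return bits

if __name__ == "__main__":
    bits = [1, 0, 1, 1, 0, 0, 1, 0]
    r = 4                             # q = 2**(r-1) = 8, N = len(bits) // r = 2 neurons
    code = encode(bits, r)
    print(code)                       # [(-1, 3), (1, 2)]
    print(decode(code, r) == bits)    # True
```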

The mapping of binary patterns into vector-neuron patterns is based on a very clear idea. This idea resembles the method used previously in sparse coding ([30]), where, owing to redundant coding, it was possible to increase the storage capacity compared with the Hopfield model. At the same time the noise immunity of that system was very low. In our case there is no coding redundancy, the storage capacity increases drastically, and the noise immunity is much greater. In future work we plan to compare PNN with sparse coding in detail.

IV. Conclusions 

Since the early 1990s the intensity of research on q-valued neural networks has sharply decreased. Presumably this can be explained by the absence of progress in the development of effective models of associative memory. The computer algorithm of the PNN architecture demonstrates that we are approaching magnitudes of storage capacity and noise immunity that could be of interest for practical applications. The use of PNN architectures seems very promising to us.